READ ME This script estimates seed load = a proxy for the amount of HDM seed that is hitting a given target tree. There are three pieces in the script: (1) seed production is estimated by combining DMR and crown volume estimates, (2) the proportion of a trees seed production that hits a given distance from the tree stem is estimated with a seed dispersal function and (3) pairs of source and target trees at each site are defined and interception between them is estimated. These three pieces are combined to estimate seed load.

Right now, only live Hw are considered source and target trees. Future versions could find a way to consider dead trees and Amabalis fir, the other HDM host, which was only present in significant amounts at one site (ph_2).

LIST OF POTENTIAL IMPROVEMENTS/FUTURE WORK - Is multiplying DMR x crown volume the best approach? Look at how they combine dmr and crown volume in Robinson model - Adapt dispersal function to operate at the crown third level. E.g. start exponential decay function at edge of crown at the base or middle of each crown third - Need to trial multiple versions of interception workflow that differ in the width of the path used to find intercepting trees and the factors that reduce the seed load value - Trial some analysis pathways that include dead trees - Include Amabalis fir as source trees (ph_2 site)

REFERENCES Smith, Richard B. ‘Hemlock and Larch Dwarf Mistletoe Seed Dispersal’. The Forestry Chronicle 42, no. 4 (1 December 1966): 395–401. https://doi.org/10.5558/tfc42395-4.

Bloomberg, W. J., R. B. Smith, and A. Van Der Wereld. ‘A Model of Spread and Intensification of Dwarf Mistletoe Infection in Young Western Hemlock Stands’. Canadian Journal of Forest Research 10, no. 1 (1 March 1980): 42–52. https://doi.org/10.1139/X80-008. ##############

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## here() starts at /Users/hannosoutham/Library/CloudStorage/OneDrive-UBC/Msc/R_MSc
## 
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## 
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')

Read in and format the data

## Rows: 2687 Columns: 58
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): tree_id, flag_id, site_id, spp, status, hdm_pa, b_lc, broom_pa, br...
## dbl (34): X, Y, plot_id, dist_x, dist_y, dbh, height_m, dmr_l, dmr_m, dmr_u,...
## lgl  (2): outside_10, assessed_by
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
##        X                 Y            tree_id            flag_id         
##  Min.   :1040191   Min.   :447219   Length:2687        Length:2687       
##  1st Qu.:1046831   1st Qu.:455090   Class :character   Class :character  
##  Median :1050257   Median :479307   Mode  :character   Mode  :character  
##  Mean   :1147826   Mean   :491696                                        
##  3rd Qu.:1249904   3rd Qu.:483375                                        
##  Max.   :1267956   Max.   :583751                                        
##                                                                          
##    site_id             plot_id          spp                dist_x       
##  Length:2687        Min.   :  1.0   Length:2687        Min.   :-2.5000  
##  Class :character   1st Qu.: 52.0   Class :character   1st Qu.:-1.5000  
##  Mode  :character   Median :122.0   Mode  :character   Median :-0.2000  
##                     Mean   :143.6                      Mean   :-0.0475  
##                     3rd Qu.:249.0                      3rd Qu.: 1.5000  
##                     Max.   :324.0                      Max.   : 2.5000  
##                                                        NA's   :682      
##      dist_y         status               dbh            height_m    
##  Min.   : 0.10   Length:2687        Min.   :-12.15   Min.   : 7.10  
##  1st Qu.: 6.95   Class :character   1st Qu.:  7.40   1st Qu.:20.25  
##  Median :13.40   Mode  :character   Median : 11.82   Median :24.55  
##  Mean   :15.39                      Mean   : 16.72   Mean   :25.88  
##  3rd Qu.:20.25                      3rd Qu.: 20.45   3rd Qu.:32.58  
##  Max.   :52.90                      Max.   :280.00   Max.   :45.20  
##  NA's   :1636                                        NA's   :2509   
##     hdm_pa              b_lc               dmr_l            dmr_m       
##  Length:2687        Length:2687        Min.   :0.0000   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:0.0000   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :0.0000   Median :0.0000  
##                                        Mean   :0.4442   Mean   :0.3271  
##                                        3rd Qu.:1.0000   3rd Qu.:0.0000  
##                                        Max.   :2.0000   Max.   :2.0000  
##                                        NA's   :1039     NA's   :1039    
##      dmr_u         broom_pa          broom_pos           stem_pa         
##  Min.   :0.000   Length:2687        Length:2687        Length:2687       
##  1st Qu.:0.000   Class :character   Class :character   Class :character  
##  Median :0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.179                                                           
##  3rd Qu.:0.000                                                           
##  Max.   :2.000                                                           
##  NA's   :1039                                                            
##  crown_class        dam_agent_1        dam_agent_2         path_ind_1       
##  Length:2687        Length:2687        Length:2687        Length:2687       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   path_ind_2         crown_cond           notes               dist_m      
##  Length:2687        Length:2687        Length:2687        Min.   : 0.500  
##  Class :character   Class :character   Class :character   1st Qu.: 6.028  
##  Mode  :character   Mode  :character   Mode  :character   Median :15.016  
##                                                           Mean   :19.366  
##                                                           3rd Qu.:31.616  
##                                                           Max.   :52.544  
##                                                                           
##      az_deg      outside_10     assessed_by     tree_type        
##  Min.   :  0.1   Mode:logical   Mode:logical   Length:2687       
##  1st Qu.: 85.7   NA's:2687      NA's:2687      Class :character  
##  Median :170.2                                 Mode  :character  
##  Mean   :177.0                                                   
##  3rd Qu.:265.6                                                   
##  Max.   :360.0                                                   
##  NA's   :2005                                                    
##       dmr            dmr_f              ht_corr           Dec       
##  Min.   :0.0000   Length:2687        Min.   : 6.80   Min.   :15.66  
##  1st Qu.:0.0000   Class :character   1st Qu.:19.82   1st Qu.:15.72  
##  Median :0.0000   Mode  :character   Median :23.90   Median :15.98  
##  Mean   :0.9502                      Mean   :25.16   Mean   :15.96  
##  3rd Qu.:1.0000                      3rd Qu.:31.57   3rd Qu.:16.28  
##  Max.   :6.0000                      Max.   :44.20   Max.   :16.37  
##  NA's   :1039                        NA's   :2509    NA's   :954    
##   corr_az_deg         tr_az          tr_leng         dist_y_h      
##  Min.   :  0.00   Min.   :  0.0   Min.   :15.00   Min.   : 0.0775  
##  1st Qu.: 80.83   1st Qu.: 75.0   1st Qu.:15.00   1st Qu.:12.0050  
##  Median :181.05   Median :180.0   Median :21.30   Median :22.9109  
##  Mean   :190.83   Mean   :178.8   Mean   :23.26   Mean   :24.0774  
##  3rd Qu.:299.29   3rd Qu.:305.0   3rd Qu.:30.50   3rd Qu.:36.5000  
##  Max.   :359.50   Max.   :310.0   Max.   :51.00   Max.   :52.5169  
##                   NA's   :682     NA's   :682     NA's   :682      
##    plot_x_utm        plot_y_utm       sim_tree         tree_type_2       
##  Min.   :1040197   Min.   :447215   Length:2687        Length:2687       
##  1st Qu.:1046846   1st Qu.:455083   Class :character   Class :character  
##  Median :1050223   Median :479303   Mode  :character   Mode  :character  
##  Mean   :1147828   Mean   :491695                                        
##  3rd Qu.:1249927   3rd Qu.:483357                                        
##  Max.   :1267913   Max.   :583746                                        
##                                                                          
##  crown_class_2      height_cv_est      dbh_cv_est         HT_BR       
##  Length:2687        Min.   : 6.801   Min.   : 5.083   Min.   : 4.481  
##  Class :character   1st Qu.:11.062   1st Qu.: 7.833   1st Qu.: 6.861  
##  Mode  :character   Median :15.312   Median :13.918   Median : 9.236  
##                     Mean   :17.107   Mean   :17.608   Mean   :10.239  
##                     3rd Qu.:21.536   3rd Qu.:21.669   3rd Qu.:12.713  
##                     Max.   :44.200   Max.   :90.933   Max.   :25.377  
##                     NA's   :1109     NA's   :1109     NA's   :1109    
##        CL               CR              MCW              LCW        
##  Min.   : 2.321   Min.   :0.3412   Min.   : 2.276   Min.   : 1.974  
##  1st Qu.: 4.201   1st Qu.:0.3798   1st Qu.: 2.779   1st Qu.: 2.384  
##  Median : 6.076   Median :0.3968   Median : 3.864   Median : 3.279  
##  Mean   : 6.868   Mean   :0.3929   Mean   : 4.421   Mean   : 3.677  
##  3rd Qu.: 8.822   3rd Qu.:0.4097   3rd Qu.: 5.195   3rd Qu.: 4.362  
##  Max.   :18.823   Max.   :0.4259   Max.   :14.499   Max.   :10.979  
##  NA's   :1109     NA's   :1109     NA's   :1109     NA's   :1109    
##       DACB             HLCW              CV               CV_l        
##  Min.   :0.8245   Min.   : 5.305   Min.   :  3.357   Min.   :  1.551  
##  1st Qu.:1.4924   1st Qu.: 8.353   1st Qu.:  9.755   1st Qu.:  4.506  
##  Median :2.1588   Median :11.395   Median : 24.695   Median : 11.407  
##  Mean   :2.4400   Mean   :12.678   Mean   : 59.469   Mean   : 27.469  
##  3rd Qu.:3.1343   3rd Qu.:15.848   3rd Qu.: 60.200   3rd Qu.: 27.807  
##  Max.   :6.6873   Max.   :32.064   Max.   :842.493   Max.   :389.157  
##  NA's   :1109     NA's   :1109     NA's   :1109      NA's   :1109     
##       CV_m              CV_u        
##  Min.   :  1.464   Min.   : 0.3418  
##  1st Qu.:  4.256   1st Qu.: 0.9933  
##  Median : 10.774   Median : 2.5146  
##  Mean   : 25.944   Mean   : 6.0555  
##  3rd Qu.: 26.263   3rd Qu.: 6.1299  
##  Max.   :367.549   Max.   :85.7875  
##  NA's   :1109      NA's   :1109
## tibble [2,687 × 58] (S3: tbl_df/tbl/data.frame)
##  $ X            : num [1:2687] 1043643 1043640 1043638 1043639 1043636 ...
##  $ Y            : num [1:2687] 574058 574059 574061 574064 574061 ...
##  $ tree_id      : chr [1:2687] "r62" "r63" "r64" "r65" ...
##  $ flag_id      : chr [1:2687] NA "w9" NA "w31" ...
##  $ site_id      : Factor w/ 11 levels "cr_1","cr_2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ plot_id      : int [1:2687] 166 166 166 166 166 166 166 166 166 166 ...
##  $ spp          : Factor w/ 11 levels "Ba","Bl","Cw",..: 4 7 3 7 3 3 4 8 7 7 ...
##  $ dist_x       : num [1:2687] 0.1 -1.1 -0.2 2 -2.4 -2.4 -1.5 -1.6 1.4 -0.9 ...
##  $ dist_y       : num [1:2687] 8.3 11.4 14.3 15.3 15.9 15.9 17.6 17.6 20.3 19.9 ...
##  $ status       : Factor w/ 6 levels "DS","LF","LL",..: 4 4 4 4 1 4 4 4 4 4 ...
##  $ dbh          : num [1:2687] 15.7 20.7 9.3 27.7 5 7.2 15.4 4.4 7.6 11.2 ...
##  $ height_m     : num [1:2687] NA 22.8 NA NA NA NA NA NA NA NA ...
##  $ hdm_pa       : Factor w/ 4 levels "-","N","U","Y": 1 4 1 4 1 1 1 1 4 4 ...
##  $ b_lc         : Factor w/ 3 levels "-","N","Y": 1 3 1 3 1 1 1 1 3 3 ...
##  $ dmr_l        : int [1:2687] NA 1 NA 1 NA NA NA NA 0 1 ...
##  $ dmr_m        : int [1:2687] NA 0 NA 1 NA NA NA NA 0 0 ...
##  $ dmr_u        : int [1:2687] NA 0 NA 0 NA NA NA NA 0 0 ...
##  $ broom_pa     : Factor w/ 3 levels "-","N","Y": 1 2 1 3 1 1 1 1 2 2 ...
##  $ broom_pos    : Factor w/ 8 levels "-","1","2","3",..: 1 1 1 8 1 1 1 1 1 1 ...
##  $ stem_pa      : Factor w/ 3 levels "-","N","Y": 1 3 1 2 1 1 1 1 2 2 ...
##  $ crown_class  : Factor w/ 5 levels "-","C","D","I",..: 2 2 5 2 5 5 2 4 5 4 ...
##  $ dam_agent_1  : chr [1:2687] NA NA NA NA ...
##  $ dam_agent_2  : chr [1:2687] NA NA NA NA ...
##  $ path_ind_1   : chr [1:2687] NA NA NA NA ...
##  $ path_ind_2   : chr [1:2687] NA NA NA NA ...
##  $ crown_cond   : Factor w/ 7 levels "-","1","2","3",..: 2 2 2 2 1 2 2 2 2 2 ...
##  $ notes        : chr [1:2687] NA NA NA NA ...
##  $ dist_m       : num [1:2687] 7.38 10.45 13.21 14.32 14.95 ...
##  $ az_deg       : num [1:2687] NA NA NA NA NA NA NA NA NA NA ...
##  $ outside_10   : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
##  $ assessed_by  : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
##  $ tree_type    : Factor w/ 2 levels "mature","regen": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dmr          : int [1:2687] NA 1 NA 2 NA NA NA NA 0 1 ...
##  $ dmr_f        : Factor w/ 11 levels "-","0","1","2",..: 1 3 1 4 1 1 1 1 11 3 ...
##  $ ht_corr      : num [1:2687] NA 22.3 NA NA NA NA NA NA NA NA ...
##  $ Dec          : num [1:2687] 16.4 16.4 16.4 16.4 16.4 ...
##  $ corr_az_deg  : num [1:2687] 311 304 309 318 301 ...
##  $ tr_az        : num [1:2687] 310 310 310 310 310 310 310 310 310 310 ...
##  $ tr_leng      : num [1:2687] 32.8 32.8 32.8 32.8 32.8 32.8 32.8 32.8 32.8 32.8 ...
##  $ dist_y_h     : num [1:2687] 7.38 10.39 13.2 14.18 14.76 ...
##  $ plot_x_utm   : num [1:2687] 1043649 1043649 1043649 1043649 1043649 ...
##  $ plot_y_utm   : num [1:2687] 574053 574053 574053 574053 574053 ...
##  $ sim_tree     : chr [1:2687] "N" "N" "N" "N" ...
##  $ tree_type_2  : chr [1:2687] "meas regen" "meas regen" "meas regen" "meas regen" ...
##  $ crown_class_2: chr [1:2687] "C" "C" "S" "C" ...
##  $ height_cv_est: num [1:2687] NA 22.3 NA 22.3 NA ...
##  $ dbh_cv_est   : num [1:2687] NA 21 NA 21 NA ...
##  $ HT_BR        : num [1:2687] NA 13.2 NA 13.2 NA ...
##  $ CL           : num [1:2687] NA 9.17 NA 9.17 NA ...
##  $ CR           : num [1:2687] NA 0.411 NA 0.411 NA ...
##  $ MCW          : num [1:2687] NA 5.08 NA 5.08 NA ...
##  $ LCW          : num [1:2687] NA 4.2 NA 4.2 NA ...
##  $ DACB         : num [1:2687] NA 3.26 NA 3.26 NA ...
##  $ HLCW         : num [1:2687] NA 16.4 NA 16.4 NA ...
##  $ CV           : num [1:2687] NA 60.2 NA 60.2 NA ...
##  $ CV_l         : num [1:2687] NA 27.8 NA 27.8 NA ...
##  $ CV_m         : num [1:2687] NA 26.2 NA 26.2 NA ...
##  $ CV_u         : num [1:2687] NA 6.13 NA 6.13 NA ...
## Rows: 10 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): dist_ft, n_seed_1964, n_seed_1965, n_traps, trap_size_ft2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## spc_tbl_ [10 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ dist_ft      : num [1:10] 7.1 15.8 21.2 25.5 29.2 35.4 38.1 43 45 49.5
##  $ n_seed_1964  : num [1:10] 157 83 17 19 4 4 0 0 0 0
##  $ n_seed_1965  : num [1:10] 701 544 120 141 95 48 17 6 4 2
##  $ n_traps      : num [1:10] 4 8 4 8 8 12 8 8 4 4
##  $ trap_size_ft2: num [1:10] 4 4 4 4 4 4 4 4 4 4
##  - attr(*, "spec")=
##   .. cols(
##   ..   dist_ft = col_double(),
##   ..   n_seed_1964 = col_double(),
##   ..   n_seed_1965 = col_double(),
##   ..   n_traps = col_double(),
##   ..   trap_size_ft2 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

#Seed production Create a proxy metric for seed production of live Hw trees

#Subset to live Hw trees
hw <- trees %>% filter(spp == "Hw" &
                         status %in% c("LS", "LL", "LF"))

#Gather the columns that have dmr and crown volume for each crown third into 
#long format
#The result is a dataframe with three rows (corresponding to three crown thirds)
#per tree, with values of dmr and crown volume
t_dmr <- hw %>% select(tree_id, dmr_l, dmr_m, dmr_u) %>% 
  pivot_longer(cols = starts_with("dmr"), names_to = "ct", 
               names_prefix = "dmr_", values_to = "dmr_ct")
t_cv <- hw %>% select(tree_id, CV_l, CV_m, CV_u) %>% 
  pivot_longer(cols = starts_with("CV"), names_to = "ct", 
               names_prefix = "CV_", values_to = "cv_ct")
t <- t_dmr %>% left_join(t_cv, by = c("tree_id", "ct"))

#Create a proxy for seed production by crown third by multiplying dmr and 
#crown volume
t <- t %>% mutate(sp_ct = dmr_ct*cv_ct)

#Sum these for each tree and join back to original trees dataframe
sp <- t %>% group_by(tree_id) %>% 
  summarise(sp = sum(sp_ct))

trees <- trees %>% left_join(sp, by = "tree_id")

#Seed production is NA for all trees where dmr isn't defined. 
#Set it to 0 for Hw where NAs occur and keep it as NA otherwise
trees <- trees %>% 
  mutate(sp = if_else(!is.na(sp), sp, case_when(
    spp == "Hw" ~ 0,
    spp != "Hw" ~ NA)))
#Check. Looks good, only defined for Hw
trees %>% select(spp, sp) %>% 
  group_by(spp) %>% 
  summarise(n_na = sum(is.na(sp)),
            mean = mean(sp))
## # A tibble: 11 × 3
##    spp    n_na  mean
##    <fct> <int> <dbl>
##  1 Ba       61  NA  
##  2 Bl        1  NA  
##  3 Cw      374  NA  
##  4 Dr       85  NA  
##  5 Ep       19  NA  
##  6 Fd      120  NA  
##  7 Hw        0  43.8
##  8 Mb        6  NA  
##  9 Mv       43  NA  
## 10 U         6  NA  
## 11 V        19  NA

#Seed dispersal Define function for seed dispersal

#Goal is to create a function that models the portion of seed originating from
#a tree crown that is deposited a given distance away
#The proportion of seed curve is modeled in two intervals - (1) from the crown
#base to the edge of the crown (=dripline) and (2) beyond the crown. It if fit 
#to match from Smith (1966), who measured seed deposition as a function of 
#distance from a tree stem. Within the dripline, the proportion of seed is 
#assumed constant - approx. 38% of the total seed production value, 
#irrespective of where you are in the interval. Beyond the dripline, the 
#proportion of seed follows an exponential decay function. 
#################################

#Starting with the Smith (1966) data
head(smith)
## # A tibble: 6 × 5
##   dist_ft n_seed_1964 n_seed_1965 n_traps trap_size_ft2
##     <dbl>       <dbl>       <dbl>   <dbl>         <dbl>
## 1     7.1         157         701       4             4
## 2    15.8          83         544       8             4
## 3    21.2          17         120       4             4
## 4    25.5          19         141       8             4
## 5    29.2           4          95       8             4
## 6    35.4           4          48      12             4
str(smith)
## spc_tbl_ [10 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ dist_ft      : num [1:10] 7.1 15.8 21.2 25.5 29.2 35.4 38.1 43 45 49.5
##  $ n_seed_1964  : num [1:10] 157 83 17 19 4 4 0 0 0 0
##  $ n_seed_1965  : num [1:10] 701 544 120 141 95 48 17 6 4 2
##  $ n_traps      : num [1:10] 4 8 4 8 8 12 8 8 4 4
##  $ trap_size_ft2: num [1:10] 4 4 4 4 4 4 4 4 4 4
##  - attr(*, "spec")=
##   .. cols(
##   ..   dist_ft = col_double(),
##   ..   n_seed_1964 = col_double(),
##   ..   n_seed_1965 = col_double(),
##   ..   n_traps = col_double(),
##   ..   trap_size_ft2 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
#Lengthen data from the two years in the measurements:
smith <- pivot_longer(smith, cols = starts_with("n_seed"), 
                      names_to = "yr", names_prefix = "n_seed_",
                      values_to = "n_seed")

#Convert numbers that are in feet to m
#1 ft = 0.3048m; 1ft^2 = 0.092903
smith <- smith %>% mutate(dist_m = dist_ft*0.3048,
                          trap_size_m2 = trap_size_ft2*0.092903)

#Different numbers of seed traps were put out at different distances from the
#tree so the "survey effort" was different. Convert raw number of seed data 
#too seeds/square m so its standardized
smith <- smith %>% mutate(tot_area_m2 = n_traps*trap_size_m2,
                          tot_area_ft2 = n_traps*trap_size_ft2,
                          seed_m2 = n_seed/tot_area_m2)

#Scale by circumference of circle at distance of trap from stem of tree
#A circle with radius of 10m has a larger circumference than one of with a 
#radius of 5m. So an the proportion of seeds from the source tree at a given 
#distance can't be compared to another distance without scaling. 
smith <- smith %>% mutate(seed_m2_sc = seed_m2*(2*pi*dist_m))

#Plot it
ggplot(smith, aes(x=dist_m, y =seed_m2_sc, colour = yr)) + geom_point()

#Transform data to a proportion of the total seed in a year to compare 
#relative amounts at a given distance
yr_tot <- smith %>% group_by(yr) %>% 
  summarise(seed_m2_sc_tot = sum(seed_m2_sc))
smith <- left_join(smith, yr_tot, by = "yr")
smith <- smith %>% 
  mutate(p_seed = seed_m2_sc/seed_m2_sc_tot)

#Plot it:
ggplot(smith, aes(x=dist_m, y = p_seed, colour = yr)) + geom_point()

#Okay, first distance point (7.1ft; 2.1604m) were traps set up underneath the 
#crown. Use that to estimate proportion of seed that falls within the 
#dripline of the tree. 
#Then use exponential decay function to model from the dripline outward

#Exponential decay function: N(x) = N0 * e^(-lamba*x)
##N(t) is the amount of something at x
##N0 is the initial amount of something
##x is the variable over which change occurs, usually time but in our case 
##distance
##lamda is the constant that controld the rate of decay. Bigger lambda, decays
##faster. 

#Summarize the mean of the two years
smith2 <- smith %>% 
  group_by(dist_m) %>% 
  summarise(p_seed_mean = mean(p_seed))

#Plot averaged points
ggplot(smith2, aes(x=dist_m, y = p_seed_mean)) + geom_point()

#Save proportion from first distance point. This will be constant used for 
#all trees within the dripline of a source tree
p_seed_dl <- smith2$p_seed_mean[1]

#Define new distance variable, zeroed on the first point. In this variable, the
#first observation is dist = 0 = edge of the crown. We will use this to fit the
#exponential decay function. 
dist_pt1 <- smith2$dist_m[1]
smith2 <- smith2 %>% 
  mutate(dist_m_ed = dist_m - dist_pt1)

#Solve for lambda
smith2 <- smith2 %>% 
  mutate(lambda = -log(p_seed_mean/p_seed_dl)/dist_m_ed)
l1 <- mean(smith2$lambda, na.rm = T)

#Calculate the fitted exponential decay
smith2 <- smith2 %>% 
  mutate(p_seed_ed_l1 = p_seed_dl*exp(-l1*dist_m_ed))

#Plot to check
#Looks okay. Smith data shoes almost linear decline. Function underestimates at
#intermediate values. 
smith3 <- smith2 %>% 
  pivot_longer(cols = starts_with("p_seed_"),
               names_to = "source",
               names_prefix = "p_seed_",
               values_to = "p_seed")
smith3 <- smith3 %>% 
  mutate(source = if_else(source == "mean", "smith 1966", source))
ggplot(smith3, aes(x=dist_m_ed, y = p_seed, colour = source)) + 
  geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

#Try adjusting lambda
#Use l2 as a compromise
l2 <- 0.25
l3 <- 0.20
smith2 <- smith2 %>% 
  mutate(p_seed_ed_l2 = p_seed_dl*exp(-l2*dist_m_ed),
         p_seed_ed_l3 = p_seed_dl*exp(-l3*dist_m_ed))
smith3 <- smith2 %>% 
  pivot_longer(cols = starts_with("p_seed_"),
               names_to = "source",
               names_prefix = "p_seed_",
               values_to = "p_seed")
smith3 <- smith3 %>% 
  mutate(source = if_else(source == "mean", "smith 1966", source))
ggplot(smith3, aes(x=dist_m_ed, y = p_seed, colour = source)) + 
  geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

#Use l2 as a compromise

#Define a function to be applied to hdm data
##sl: seed load (the output variable of the function)
##sp: seed production of the source tree in a given tree pair. This is 
##calculated above for each tree. 
##dist_m_ed: distance between a tree pair zeroed on the crown width of the 
##source tree (e.g. if the largest crown width (LCW) of the sources tree is 
## 5m and the distance between the tree pair is 10m, dist_m_ed = 10-5)
f_p_seed <- function(dist_m_ed){
  #[1] Define seed load in first interval: between tree stem and crown edge
  if(dist_m_ed <= 0) {
    p_seed <- p_seed_dl
  } 
  
  #[2] Define seed load in second interval: beyond crown edge
  #Define seed load as an exponential decay, controlled by lambda (l)
  if(dist_m_ed > 0) {
    p_seed <- p_seed_dl*exp(-l2*dist_m_ed)
  }
  
  return(p_seed)
}

#Interception function Workflow generally looks like: (1) find all pairs of Hw trees within 25m of each other, (2) separate into regen-mature and regen-regen pairs, (3) draw lines, then polygons between them to approximate the path of a seed, (4) intersect those paths with the rest of the trees at a site and (5) count the number of intersecting trees according to some rules to approximate interception.

Start by defining a small test area within one of the sites. This is wrapped into a function in the next block. Interception estimates come from Bloomberg et al. (1980).

#We are primarily interested in the seed load from mature trees on regen trees.
#Seed load between regen trees is also included
#Seed load from regen trees on mature trees isn't considered (and is 
#probably negligible)

#Some concrete definitions/rules:
#What's what:
## tree1 = target tree
## tree2 = source tree

#Pairs we are going to consider:
## [tree1 = regen tree, tree2 = mature tree]
## [tree1 = regen, tree2 = regen tree]

#How is interception calculated
## Only trees in the same component can intercept;
## Crown_class_2 is used to define interception rules;
## Factors are the proportion of seed production blocked by an intercepting 
## tree. Bloomberg et al. (1980) estimates 90% of seed is blocked by an 
## intervening tree in the same canopy class. So we are going to use 0.9 for 
## trees in the same canopy class and 0.45 for trees in a lower canopy class
## as a starting place. The functions are defined with named variables 
## (icpt_f1 and icpt_f2) so that we can test different factors to see what 
## gives the best predictions.

#FACTOR RULES
## For tree1=regen, tree2=mature:
### if tree2 is C, C and I (but to a lesser extent) can intercept
##### C factor = 0.9, I factor = 0.45
### if tree2 is I, C, I and S (but to a lesser extent) can intercept
##### C factor = 0.9, I factor = 0.9, S factor = 0.45
### if tree2 is S, I and S can intercept
##### I factor = 0.9, S factor = 0.9

## For tree1=regen, tree2=regen
### if tree2 is C, C and I (but to a lesser extent) can intercept
##### C factor = 0.9, I = 0.45
### if tree2 is I, C, I and S (but to a lesser extent) can intercept
##### C factor = 0.9, I factor = 0.9, S factor = 0.45
### if tree2 is S, C, I and S can intercept
##### C factor = 0.9, I factor = 0.9, S factor = 0.9

#How dead trees are considered:
##In first iteration, they aren't. Dead trees removed from seed load 
##calculations. 

#Convert trees object to a spatial (sf) object
#Coordinates are called X and Y
#CRS BC Albers NAD83 = 3005
trees_sf <- st_as_sf(trees, coords = c("X", "Y"), crs = 3005)
#check CRS
st_crs(trees_sf)
## Coordinate Reference System:
##   User input: EPSG:3005 
##   wkt:
## PROJCRS["NAD83 / BC Albers",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["British Columbia Albers",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",45,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-126,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",50,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",58.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",1000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Province-wide spatial data management."],
##         AREA["Canada - British Columbia."],
##         BBOX[48.25,-139.04,60.01,-114.08]],
##     ID["EPSG",3005]]
#Filter the mature trees from the dataset that were beyond the 10m boundary 
#used across all sites. 
trees_sf <- trees_sf %>% filter(outside_10 == "N" | is.na(outside_10))

#Pull out a small area from one site to use as a test piece
#Start by subsetting to a site and plotting it
mi_1 <- trees_sf %>% filter(site_id == "mi_1")
tmap_mode("plot") + tm_shape(mi_1) + 
  tm_symbols() + tm_text("tree_id") + tm_grid()
## tmap mode set to plotting

#Create the dataset by filtering based on x and y coordinates
test <- mi_1 %>% 
  mutate(x_utm = st_coordinates(.)[,1],
         y_utm = st_coordinates(.)[,2]) %>% 
  filter((x_utm < 1264055 & x_utm > 1264030) & 
           (y_utm < 476900 & y_utm > 476875))

#Plot so we can see what we are working with
tmap_mode("plot") + tm_shape(test) + 
  tm_symbols(col = "spp", shape = "tree_type") + tm_text("tree_id") + 
  tm_grid()
## tmap mode set to plotting

tmap_mode("plot") + tm_shape(test) + 
  tm_symbols(col = "crown_class_2", shape = "tree_type") + 
  tm_text("tree_id") + tm_grid()
## tmap mode set to plotting

#Subset the dataset to live Hw trees
hw <- test %>% filter(spp == "Hw" &
                           status %in% c("LS", "LF", "LL"))
dim(hw) #Number of pairs is number of rows^2
## [1] 26 60
#Save row number as a variable
hw$row <- as.integer(row.names(hw))

#Calculate the distance between each tree pair
dim(hw) #26 trees, unique pairs = 26^2 = 676
## [1] 26 61
dist_matrix <- st_distance(hw, by_element = FALSE)
dim(dist_matrix)
## [1] 26 26
#Use which() function to test which are <25m and get indices 
#(row/column numbers) for those. Then turn this into a dataframe - column 1 
#specifies row # (corresponds to row # in the site level dataframe), column 2 
#specifies the column # (also corresponds to row # in the site level 
#dataframe) and column 3 specifies the distance value.
indices <- which(dist_matrix < units::set_units(25, "m"), arr.ind = TRUE)
pairs <- as.data.frame(indices)
pairs$dist_m <- as.numeric(dist_matrix[indices])
dim(pairs) #676 pairs (all Hw in test area are within 25m of each other here)
## [1] 676   3
#Rename row and col in the pairs df. These correspond to the row 
#number of tree1 and tree2 in the site level dataframe respectively
pairs <- pairs %>% 
  rename(t1_row = row, t2_row = col)

#At this point, the dataframe has a row for each pair, with the distance value
#and the ids of each tree
#Add attributes of each tree to the dataframe of pairs
##tree_id, tree_type, status, species, crown class, plot_id (corresponding to 
## transect_id for regen trees) and sp (seed production, only for source trees)
x <- hw %>% st_drop_geometry()
pairs <- left_join(pairs,
              select(x, row, tree_id, plot_id, tree_type, status, spp,
                     crown_class_2),
              by = join_by(t1_row == row)) %>% 
    rename(tree1 = tree_id, t1_pid = plot_id, t1_st = status, t1_spp = spp, 
           t1_tt = tree_type, t1_cc = crown_class_2)
pairs <- left_join(pairs,
              select(x, row, tree_id, plot_id, tree_type, status, spp, 
                     crown_class_2, sp),
              by = join_by(t2_row == row)) %>% 
    rename(tree2 = tree_id, t2_pid = plot_id, t2_st = status,t2_spp = spp, 
           t2_tt = tree_type, t2_cc = crown_class_2)

#Add a variable that identifies pair type
pairs <- pairs %>% 
    mutate(pair_type = case_when(
      (t1_tt == "regen" & t2_tt == "regen") ~ "r-r",
      (t1_tt == "regen" & t2_tt == "mature") ~ "r-m",
      (t1_tt == "mature" & t2_tt == "mature") ~ "m-m"))

#Filter based on the rules of pairs (above)
##Filter out rows where tree1 and tree2 are the same
pairs <- pairs %>% filter(tree1 != tree2)
dim(pairs)
## [1] 650  17
##Filter to pairs we are going to consider
##regen-mature pairs and regen-regen pairs on same transect
pairs <- pairs %>% 
    filter((pair_type == "r-m") | 
           (pair_type == "r-r" &
              t1_pid == t2_pid))
dim(pairs)
## [1] 175  17
##Filter out cases where sp = 0
pairs <- pairs %>% 
  filter(sp>0)
dim(pairs)
## [1] 123  17
##Filter out trees in the same place (dist = 0)
##Use this object to calculate interception. Trees in same place have no 
##interception by default and the spatial function below would throw up 
##errors if they were present. But these pairs are relevant and contribute 
##a lot to seed load (a tree forked with another is a major infection source). 
##So save the larger object with the dist_m=0 trees to join later. 
pairs_icpt <- pairs %>% filter(dist_m > 0)
dim(pairs_icpt)
## [1] 123  17
#Check how many pairs are left at this point
dim(pairs_icpt) #123
## [1] 123  17
#Graph the test set again so we know what we are looking at
tmap_mode("plot") + tm_shape(test) + 
  tm_symbols(col = "crown_class_2", shape = "tree_type") + tm_grid()
## tmap mode set to plotting

tmap_mode("plot") + tm_shape(test) + 
  tm_symbols(col = "dmr") + tm_grid()
## tmap mode set to plotting

tmap_mode("plot") + tm_shape(test) + 
  tm_symbols(col = "sp") + tm_grid()
## tmap mode set to plotting

#Now we need to incorporate interception
#Each row in the dataset contains a unique pair
#In these, (2,1) and (1,2) are considered unique, even though these are the 
#same two trees. This is because directionality matters. In (2, 1) we are 
#modelling seed moving from tree 1 to tree 2; in (1, 2) we are modelling seed #moving from tree 2 to tree 1. 
#Because of the filtering above, for regen-mature tree pairs, only one case
#will be represented in the data (the case where tree 1 (target) is regen and 
#tree 2 (source) is a mature tree). But for regen-regen pairs, there will be
#two cases.
#We are going to define potential interception trees based on the crown class 
#of the source tree, so directionality matters for the interception component 
#too.
#Interception is calculated within a component (see rules above). So separate
#out two sets of pairs [regen-mature] and [regen-regen] at outset
pair_lines_rm <- pairs_icpt %>% 
  filter(t1_tt == "regen" & t2_tt == "mature")
pair_lines_rr <- pairs_icpt %>% 
  filter(t1_tt == "regen" & t2_tt == "regen")

#Draw lines between each unique pair
#Subset dataframes of pairs to the index columns that relate each tree in a 
#pair back to the site level dataframe
pair_lines_rm <- pair_lines_rm %>% select(t1_row, t2_row)
pair_lines_rr <- pair_lines_rr %>% select(t1_row, t2_row)

#Now use theses dataframes to call points from hw object to connect with a 
#line
pair_lines_rm <- pair_lines_rm %>% 
  rowwise() %>% 
  mutate(geometry = 
           st_union(hw$geometry[t1_row,], hw$geometry[t2_row,]) %>% 
           st_cast("LINESTRING")) %>% 
  ungroup() %>% 
  st_as_sf()
pair_lines_rr <- pair_lines_rr %>% 
  rowwise() %>% 
  mutate(geometry = 
           st_union(hw$geometry[t1_row,], hw$geometry[t2_row,]) %>% 
           st_cast("LINESTRING")) %>% 
  ungroup() %>% 
  st_as_sf()

#Check the geometries all came out valid
#Should be empty (0 rows)
pair_lines_rm[!st_is_valid(pair_lines_rm), ]
## Simple feature collection with 0 features and 2 fields
## Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
## Projected CRS: NAD83 / BC Albers
## # A tibble: 0 × 3
## # ℹ 3 variables: t1_row <int>, t2_row <int>, geometry <GEOMETRY [m]>
pair_lines_rr[!st_is_valid(pair_lines_rr), ]
## Simple feature collection with 0 features and 2 fields
## Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
## Projected CRS: NAD83 / BC Albers
## # A tibble: 0 × 3
## # ℹ 3 variables: t1_row <int>, t2_row <int>, geometry <GEOMETRY [m]>
#Visualize this again
tmap_mode("plot") + 
  tm_shape(pair_lines_rm) + tm_lines(col = "darkblue") +
  tm_shape(pair_lines_rr) + tm_lines(col = "lightblue") +
  tm_shape(test) + tm_symbols(shape = "tree_type", col = "status") + tm_grid()
## tmap mode set to plotting

#Now buffer these lines by 5m
pair_fp_rm <- pair_lines_rm %>% 
  st_buffer(dist = 2.5, endCapStyle = "FLAT")
pair_fp_rr <- pair_lines_rr %>% 
  st_buffer(dist = 2.5, endCapStyle = "FLAT")

#Visualize this again, but just for a few paths from one tree
#Can see that there are some intercepting mature trees on this path
tmap_mode("plot") + 
  tm_shape(test) + tm_symbols(col = "tree_type") + tm_grid() +
  tm_shape(pair_fp_rm[1:3,]) + tm_polygons(alpha = 0.2, col = "darkblue") + 
  tm_shape(pair_fp_rr[1:3,]) + tm_polygons(alpha = 0.2, col = "lightblue") 
## tmap mode set to plotting

#Add tree_ids to the footprints
x <- pairs_icpt %>% 
  select(t1_row, t2_row, tree1, tree2)
pair_fp_rm <- left_join(pair_fp_rm, x, by = c("t1_row", "t2_row"))
pair_fp_rr <- left_join(pair_fp_rr, x, by = c("t1_row", "t2_row"))

#With these polygons, representing the path between the crowns of two trees, 
#we can calculate interception
#For [regen-mature] pairs:
icpt_rm <- pair_fp_rm %>% 
  rowwise() %>% 
  mutate(
    #Intersect each polygon (buffered line) with the points in the site
    #level dataframe. Stored as a list.
    #Filter out the source tree (it shouldn't intercept seeds coming from 
    #itself) and to mature trees (because interception only considered within
    #a component)
    intersect_indices = list(which(test$tree_id != tree2 & 
                                     test$tree_type == "mature" &
                                     st_intersects(test, geometry, 
                                                  sparse = FALSE))),
    
    #Use the indices to subset the site level dataframe to the trees that
    #intersect. Also stored as a list. 
    #slice() subsets dataframe based on row indicies
    intersect_trees = list(test %>% slice(intersect_indices)),
    
    #Get counts of number of trees in different categories 
    #All stems
    icpt_all = nrow(intersect_trees),
    
    #All live standing stems
    icpt_ls = intersect_trees %>%
      filter(status %in% c("LS", "LF", "LL")) %>% 
      nrow(),
    
    #All livestanding trees by hw/non-hw and canopy class
    icpt_hw_C = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "C") %>% 
      nrow(),
    icpt_hw_I = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "I") %>% 
      nrow(),
    icpt_hw_S = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "S") %>% 
      nrow(),
    icpt_nh_C = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "C") %>% 
      nrow(),
    icpt_nh_I = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "I") %>% 
      nrow(),
    icpt_nh_S = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "S") %>% 
      nrow())

##################
#For [regen-regen] pairs:
icpt_rr <- pair_fp_rr %>% 
  rowwise() %>% 
  mutate(
    #Intersect each polygon (buffered line) with the points in the site
    #level dataframe. Stored as a list.
    #Filter out both trees in a pair (the goal is to identify intervening trees 
    #between the pair) and to regen treees (because interception only considered 
    #within a component)
    intersect_indices = list(which(test$tree_id != tree2 &
                                     test$tree_id != tree1 &
                                     test$tree_type == "regen" & 
                                     st_intersects(test, geometry, 
                                                  sparse = FALSE))),
    
    #Use the indices to subset the site level dataframe to the trees that
    #intersect. Also stored as a list. 
    #slice() subsets dataframe based on row indices
    intersect_trees = list(test %>% slice(intersect_indices)),
    
    #Get counts of number of trees in different categories 
    #All stems
    icpt_all = nrow(intersect_trees),
    
    #All live standing stems
    icpt_ls = intersect_trees %>%
      filter(status %in% c("LS", "LF", "LL")) %>% 
      nrow(),
    
    #All livestanding trees by hw/non-hw and canopy class
    icpt_hw_C = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "C") %>% 
      nrow(),
    icpt_hw_I = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "I") %>% 
      nrow(),
    icpt_hw_S = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp == "Hw" & 
               crown_class_2 == "S") %>% 
      nrow(),
    icpt_nh_C = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "C") %>% 
      nrow(),
    icpt_nh_I = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "I") %>% 
      nrow(),
    icpt_nh_S = intersect_trees %>% 
      filter(status %in% c("LS", "LF", "LL") &
               spp != "Hw" & 
               crown_class_2 == "S") %>% 
      nrow())

#Plot these to ensure this is working correctly.
##Select a representative case
##3 livestanding trees on path, 1 C Hw, 1 S Hw and 1 I non-Hw
icpt_rm[39,] %>% select(starts_with("icpt"))
## Simple feature collection with 1 feature and 8 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 1264035 ymin: 476885.7 xmax: 1264049 ymax: 476892.2
## Projected CRS: NAD83 / BC Albers
## # A tibble: 1 × 9
## # Rowwise: 
##   icpt_all icpt_ls icpt_hw_C icpt_hw_I icpt_hw_S icpt_nh_C icpt_nh_I icpt_nh_S
##      <int>   <int>     <int>     <int>     <int>     <int>     <int>     <int>
## 1        3       3         1         0         1         0         1         0
## # ℹ 1 more variable: geometry <POLYGON [m]>
##First, plot a path with just the intervening trees:
tmap_mode("plot") + 
  tm_shape(pair_fp_rm[39,]) + 
    tm_polygons(alpha = 0.2, col = "darkblue") +
  tm_shape(pair_lines_rm[39, ]) + 
    tm_lines() +
  tm_shape(test %>% slice(icpt_rm$intersect_indices[[39]])) + 
    tm_symbols(col = "spp", shape = "crown_class_2") + 
  tm_grid() 
## tmap mode set to plotting

##Same plot but adding the pair that created the path
tmap_mode("plot") + 
  #Path between pair (5m wide polygon)
  tm_shape(pair_fp_rm[39,]) + 
    tm_polygons(alpha = 0.2, col = "darkblue") +
  #Line connecting pair
  tm_shape(pair_lines_rm[39, ]) + 
    tm_lines() +
  #Intervening tree points
  tm_shape(test %>% slice(icpt_rm$intersect_indices[[39]])) + 
    tm_symbols(col = "spp", shape = "crown_class_2") + 
    tm_grid() +
  #Pair points
  tm_shape(hw %>% slice(c(icpt_rm$t1_row[39], icpt_rm$t2_row[39]))) +
    tm_symbols(col="black")
## tmap mode set to plotting

##################
#Now use tree counts to calculate interception factors
#First, join tree type and crown class class to table
x <- pairs %>% select(t1_row, t2_row, t1_tt, t2_tt, t1_spp, t2_spp,
                               t1_cc, t2_cc)
icpt_rm <- left_join(icpt_rm, x, by = c("t1_row", "t2_row"))
icpt_rr <- left_join(icpt_rr, x, by = c("t1_row", "t2_row"))

#Calculate total interception. See rules and factors at the start of section
#for reference
icpt_f1 <- 0.9
icpt_f2 <- 0.45
#For [regen-mature pairs]
icpt_rm <- icpt_rm %>% 
  mutate(
    icpt_tot = case_when(
      t2_cc == "C" ~ 
        (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f1 +
        (icpt_hw_S + icpt_nh_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_hw_I + icpt_nh_I)*icpt_f1 + (icpt_hw_S + icpt_nh_S)*icpt_f1,
      .default = 0),
    icpt_hw = case_when(
      t2_cc == "C" ~ 
        (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f1,
      .default = 0),
    icpt_nh = case_when(
      t2_cc == "C" ~ 
        (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f1,
      .default = 0)
    )

#For [regen-regen] pairs
icpt_rr <- icpt_rr %>% 
  mutate(
    icpt_tot = case_when(
      t2_cc == "C" ~ 
        (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f1 +
        (icpt_hw_S + icpt_nh_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_hw_C + icpt_nh_C)*icpt_f1+ (icpt_hw_I + icpt_nh_I)*icpt_f1 + 
        (icpt_hw_S + icpt_nh_S)*icpt_f1,
      .default = 0),
    icpt_hw = case_when(
      t2_cc == "C" ~ 
        (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f1,
      .default = 0),
    icpt_nh = case_when(
      t2_cc == "C" ~ 
        (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f2,
      t2_cc == "I" ~ 
        (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f2,
      t2_cc == "S" ~ 
        (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f1,
      .default = 0)
    )

#Get a summary of interception
icpt_rm %>% select(icpt_tot, icpt_hw, icpt_nh) %>% 
  summary()
##     icpt_tot       icpt_hw          icpt_nh                 geometry  
##  Min.   :0.00   Min.   :0.0000   Min.   :0.00000   POLYGON      :105  
##  1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.00000   epsg:3005    :  0  
##  Median :0.00   Median :0.0000   Median :0.00000   +proj=aea ...:  0  
##  Mean   :0.84   Mean   :0.8271   Mean   :0.01286                      
##  3rd Qu.:0.90   3rd Qu.:0.9000   3rd Qu.:0.00000                      
##  Max.   :3.60   Max.   :3.6000   Max.   :0.45000
icpt_rm %>% select(icpt_tot, icpt_hw, icpt_nh) %>% 
  summary()
##     icpt_tot       icpt_hw          icpt_nh                 geometry  
##  Min.   :0.00   Min.   :0.0000   Min.   :0.00000   POLYGON      :105  
##  1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.00000   epsg:3005    :  0  
##  Median :0.00   Median :0.0000   Median :0.00000   +proj=aea ...:  0  
##  Mean   :0.84   Mean   :0.8271   Mean   :0.01286                      
##  3rd Qu.:0.90   3rd Qu.:0.9000   3rd Qu.:0.00000                      
##  Max.   :3.60   Max.   :3.6000   Max.   :0.45000
#Now join the dataframes back together
icpt_all <- rbind(icpt_rm, icpt_rr)

#Also join the connecting lines back together for visualizations
pair_lines <- rbind(pair_lines_rm, pair_lines_rr)

#icpt_xx objects are polygons of paths between trees that have 
#interception attributes attached. Don't need to keep the polygons for 
#the final output so drop those 
icpt_all <- icpt_all %>% st_drop_geometry()

#Join interception estimates back to dataframe of pairs
x <- icpt_all %>% select(t1_row, t2_row, icpt_tot, icpt_hw, 
                           icpt_nh)
pairs <- left_join(pairs, x, by = c("t1_row", "t2_row"))

#Set interception equal to 0 for pairs in the same place
#There aren't any of these cases in the test dataframe but there are in the 
#larger dataset
pairs <- pairs %>% 
    mutate(across(c("icpt_tot", "icpt_hw", "icpt_nh"), 
                  ~if_else(dist_m == 0, 0, .)))

#Join interception estimates to paired lines
x <- pairs %>% select(t1_row, t2_row, pair_type, icpt_tot, icpt_hw, 
                           icpt_nh)
pair_lines <- left_join(pair_lines, x, by = c("t1_row", "t2_row"))

#Plot interception to get a sense of its distribution:
ggplot(pairs, aes(x = pair_type, y = icpt_tot)) + 
  geom_jitter(width = 0.05) + geom_violin(fill = NA)

Still working with the test area, combine dispersal function, seed production and interception to calculate seed load

#Seed dispersal function (defined above) operates over two intervals (1) within 
#the crown of a tree and (2) beyond it. In first interval, a constant factor is 
#applied to the seed production value of the source tree to estimate the 
#proportion of seed hitting the target tree. Beyond the crown, an exponential
#decay function is used to estimate the proportion of seed. 

#Add crown width of the source tree
x <- trees %>% select(tree_id, LCW)
pairs <- left_join(pairs, x, by = join_by(tree2 == tree_id))
summary(pairs$LCW)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.018   3.690   8.741   6.331   8.741   8.741
#Define new distance variable: distance zeroed on the edge of the crown
pairs <- pairs %>% 
  mutate(dist_m_ed = dist_m - (LCW/2))

#Check there are no NAs in sp
any(is.na(pairs$sp))
## [1] FALSE
#Then use dispersal function to estimate proportion of sp hitting a given tree 
pairs <- pairs %>% rowwise %>% 
  mutate(p_seed = f_p_seed(dist_m_ed))

#Check there are no NAs
any(is.na(pairs$p_seed))
## [1] FALSE
#Plot proportion of seed vs. distance between pairs
#Coloured by crown class of source tree because they have bigger crowns and 
#should be able to reach trees farther away
ggplot(pairs, aes(x = dist_m, y = p_seed, shape = t2_tt, 
                  colour = t2_cc)) + 
  geom_point()

#Also just plot simple violin plot
ggplot(pairs, aes(x = t2_tt, y = p_seed)) + 
  geom_point() +
  geom_violin(fill = NA)

#Now calculate seed load assuming no interception (just accounting for 
#dispersal)
pairs <- pairs %>% 
  mutate(sl_ni = sp*p_seed)

#Plot
ggplot(pairs, aes(x = dist_m, y = sl_ni, shape = t2_tt)) + 
  geom_point()

#Now calculate sl with interception
pairs <- pairs %>% 
  mutate(sl_i_tot = sl_ni - (sl_ni*icpt_tot),
         sl_i_hw = sl_ni - (sl_ni*icpt_hw),
         sl_i_nh = sl_ni - (sl_ni*icpt_nh)) %>% 
  mutate(sl_i_tot = if_else(sl_i_tot < 0, 0, sl_i_tot),
         sl_i_hw = if_else(sl_i_hw < 0, 0, sl_i_hw),
         sl_i_nh = if_else(sl_i_nh < 0, 0, sl_i_nh))

#Check there are no NAs in the sl columns
any(is.na(pairs$sl_ni))
## [1] FALSE
any(is.na(pairs$sl_i_tot))
## [1] FALSE
any(is.na(pairs$sl_i_hw))
## [1] FALSE
any(is.na(pairs$sl_i_nh))
## [1] FALSE
#Summarise data to get the cumulative seed load for each target tree
##For regen-mature pairs
sl_target_rm <- pairs %>% 
  filter(pair_type == "r-m") %>% 
  group_by(tree1) %>% 
  summarise(sl_ni_rm = sum(sl_ni),
            sl_i_tot_rm = sum(sl_i_tot),
            sl_i_hw_rm = sum(sl_i_hw),
            sl_i_nh_rm = sum(sl_i_nh))

##For regen-regen pairs
sl_target_rr <- pairs %>% 
  filter(pair_type == "r-r") %>% 
  group_by(tree1) %>% 
  summarise(sl_ni_rr = sum(sl_ni),
            sl_i_tot_rr = sum(sl_i_tot),
            sl_i_hw_rr = sum(sl_i_hw),
            sl_i_nh_rr = sum(sl_i_nh))

#Add these values back to the tree level dataframe:
test <- left_join(test, sl_target_rm, by = join_by(tree_id == tree1))
test <- left_join(test, sl_target_rr, by = join_by(tree_id == tree1))

#Gather seed load columns in the pairs dataset and the test dataset to do 
#some plotting
g1 <- pairs %>% 
  pivot_longer(cols = starts_with("sl_"),
               names_to = "sl_ver",
               names_prefix = "sl_",
               values_to = "sl") %>% 
  mutate(sl_source = case_match(
    pair_type,
    "r-r" ~ "regen",
    "r-m" ~ "mature"
  ))

g2 <- test %>% 
  pivot_longer(cols = starts_with("sl_"),
               names_to = c("sl_ver", "sl_source"),
               names_pattern = "sl_(.*)_(rm|rr)",
               values_to = "sl") %>% 
  mutate(sl_source = case_when(
    sl_source == "rr" ~ "regen",
    sl_source == "rm" ~ "mature"
  ))

#Visualize seed load for a given target tree
##Regen tree at centre of plot is r3
##Create a lines object that has the seed load values
#Layer this on top of a point layer that is coloured by cumulative seed load
##Facet by the different versions of seed load (no interception, hemlock 
##interception, non-hw interception and total interception)
x <- g1 %>% select(t1_row, t2_row, tree1, tree2, sl, sl_ver, sl_source)
x1 <- pair_lines %>% select(t1_row, t2_row)
pair_lines_sl <- left_join(x, x1, by = c("t1_row", "t2_row")) 
pair_lines_sl <- st_as_sf(pair_lines_sl)

#Plot looking at seed load from mature component first:
tmap_mode("plot") + 
  tm_shape((g2 %>% filter(sl_source == "mature"))) + 
    tm_symbols(col = "sl") + tm_facets(by = "sl_ver") +
  tm_shape(filter(pair_lines_sl, tree1 == "r3" & sl_source == "mature")) + 
    tm_lines(lwd = "sl", scale = 5) +  
  tm_grid() +
  tm_facets(by = "sl_ver")
## tmap mode set to plotting

#Then looking at regen component:
tmap_mode("plot") + 
  tm_shape((g2 %>% filter(sl_source == "regen"))) + 
    tm_symbols(col = "sl") + tm_facets(by = "sl_ver") +
  tm_shape(filter(pair_lines_sl, tree1 == "r3" & sl_source == "regen")) + 
    tm_lines(lwd = "sl", scale = 5) +  
  tm_grid() +
  tm_facets(by = "sl_ver")
## tmap mode set to plotting

#Plot a violin plot looking at seed load by with and without interception
#originating from mature and regen sources
##Note: axis scales are different
ggplot(g2, aes(x = sl_ver, y = sl)) +
  geom_point() +
  geom_violin(fill=NA)+
  facet_wrap(~sl_source, scales = "free")
## Warning: Removed 272 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
## Warning: Removed 272 rows containing missing values or values outside the scale range
## (`geom_point()`).

#How much is seed load reduced by interception? 
#Calculate ratio between seed load with and without interception
#Answer: some, seed load accounting for interception is ~ 47% of
#initial sl value
pairs <- pairs %>% 
  mutate(icpt_frac = sl_i_tot/sl_ni) 
pairs %>% select(icpt_frac) %>% summary()
##    icpt_frac     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.1000  
##  Mean   :0.4699  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

Now recreate the example above but applied across all trees at all sites. Start by defining a function for interception:

#Input requirements:
##sf: sf dataframe of trees at a site, with point geometry
##buffer: scalar vector: how wide should the path between a pair of trees be?
##all trees stems that intersect this path are considered as interception trees.
##icpt_f1: interception factor 1, for trees in the same canopy class (or 
##considered to have foliage throughout the same vertical stratum) as the 
##source tree
##icpt_f2: interception factor 2, for trees in a lower canopy class than the 
##source tree, that only restrict some of the vertical space on a path
f_icpt <- function(sf, buffer, icpt_f1, icpt_f2) {
  #tree1 = target tree
  #tree2 = source tree
  
  #Save site_id of the df
  site_id <- as.character(sf$site_id[1])
  
  #Filter dataframe to live Hw and Ba trees
  hw <- sf %>% filter(spp == "Hw" &
                           status %in% c("LS", "LF", "LL"))
  dim(hw) #Number of pairs is number of rows^2
  
  #Save row number as a variable
  hw$row <- as.integer(row.names(hw))
  
  #Calculate distance between each unique pair in this set
  dist_matrix <- st_distance(hw, by_element = FALSE)
  dim(dist_matrix)
  
  #Get indices of pairs within 25m. Then use these indices to create a df of 
  #pairs
  indices <- which(dist_matrix < units::set_units(25, "m"), arr.ind = TRUE)
  pairs <- as.data.frame(indices)
  pairs$dist_m <- as.numeric(dist_matrix[indices])
  dim(pairs) 
  
  #Rename row and col in the pairs df. These correspond to the row 
  #number of tree1 and tree2 in the site level dataframe respectively
  pairs <- pairs %>% 
    rename(t1_row = row, t2_row = col)
  
  #At this point, the dataframe has a row for each pair, with the distance
  #value
  #Add tree_id, tree type, status, species, crown class and plot_id 
  #(=transect_id) of tree one and two
  #Also add seed production for source tree (tree2)
  x <- hw %>% st_drop_geometry()
  pairs <- 
    left_join(pairs,
              select(x, row, tree_id, plot_id, tree_type, status, spp,
                     crown_class_2),
              by = join_by(t1_row == row)) %>% 
    rename(tree1 = tree_id, t1_pid = plot_id, t1_st = status, t1_spp = spp, 
           t1_tt = tree_type, t1_cc = crown_class_2)
  pairs <- 
    left_join(pairs,
              select(x, row, tree_id, plot_id, tree_type, status, spp, 
                     crown_class_2, sp),
              by = join_by(t2_row == row)) %>% 
    rename(tree2 = tree_id, t2_pid = plot_id, t2_st = status,t2_spp = spp, 
           t2_tt = tree_type, t2_cc = crown_class_2)
  
  #Add a variable that identifies pair type
  pairs <- pairs %>% 
    mutate(pair_type = case_when(
      (t1_tt == "regen" & t2_tt == "regen") ~ "r-r",
      (t1_tt == "regen" & t2_tt == "mature") ~ "r-m",
      (t1_tt == "mature" & t2_tt == "mature") ~ "m-m",
    ))
  
  #Filter based on the rules of pairs
  ##Filter out rows where tree1 and tree2 are the same
  pairs <- pairs %>% filter(tree1 != tree2)
  dim(pairs)
  
  ##Filter to pairs we are going to consider
  ##regen-mature pairs and regen-regen pairs on same transect
  pairs <- 
    pairs %>% 
    filter((pair_type == "r-m") | 
           (pair_type == "r-r" &
              t1_pid == t2_pid))
  dim(pairs)
  
  ##Filter out cases where sp = 0
  pairs <- pairs %>%
    filter(sp > 0)
  dim(pairs)
  
  ##Filter out trees in the same place (dist = 0)
  ##Use this object to calculate interception. Trees in same place have no 
  ##interception by default and the spatial function below would throw up 
  ##errors if they were present. But these pairs are relevant and contribute 
  ##a lot to seed load (a tree forked with another is a major infection source). 
  ##So save the larger object with the dist_m=0 trees to join later. 
  pairs_icpt <- pairs %>% filter(dist_m > 0)
  dim(pairs_icpt)
  
  #Interception is calculated within a component (see rules above). So separate
  #out two sets of pairs [regen-mature] and [regen-regen] at outset.
  #The workflow for each component is nearly identical but there are small 
  #differences. They are kept separate from here until just before the end of
  #the script. 
  pairs_rm <- pairs_icpt %>% 
    filter(t1_tt == "regen" & t2_tt == "mature")
  pairs_rr <- pairs_icpt %>% 
    filter(t1_tt == "regen" & t2_tt == "regen")
  
  #Check if these objects have pairs in them. Use if statements to lead into
  #the next function (or not if they are empty). 
  
  #For [regen-mature] pairs
  if(nrow(pairs_rm) > 0) {
    #Draw lines between each unique pair
    ##Subset dataframes of pairs to the index columns that relate each tree 
    ##in a pair back to the site level dataframe
    ##Also include tree_ids
    pair_lines_rm <- pairs_rm %>% select(tree1, tree2, t1_row, t2_row)
    
    ##Now use these dataframes to call points from original pairs object 
    ##(hw) to connect with a line
    pair_lines_rm <- pair_lines_rm %>% 
      rowwise() %>% 
      mutate(geometry = 
               st_union(hw$geometry[t1_row,], hw$geometry[t2_row,]) %>% 
               st_cast("LINESTRING")) %>% 
      ungroup() %>% 
      st_as_sf()
    
    ##Add a check to ensure geometries are valid
    if(any(!st_is_valid(pair_lines_rm))) {
      stop("Some regen-mature pair lines have invalid geometry")
    }
    
    #Buffer lines by buffer distance, specified in function
    #fp = footprint
    pair_fp_rm <- pair_lines_rm %>% 
      st_buffer(dist = buffer, endCapStyle = "FLAT")
    
    #Now calculate number of trees on each path
    icpt_rm <- pair_fp_rm %>% 
      rowwise() %>% 
      mutate(
        #Intersect each polygon (buffered line) with the points in the site
        #level dataframe. Stored as a list
        intersect_indices = list(which(sf$tree_id != tree2 & 
                                     sf$tree_type == "mature" &
                                     st_intersects(sf, geometry, 
                                                  sparse = FALSE))),
        
        #Use the indices to subset the site level dataframe to the trees that
        #intersect. Also stored as a list. 
        #slice() subsets dataframe based on row indicies
        intersect_trees = list(sf %>% slice(intersect_indices)),
        
        #Get counts of number of trees in different categories 
        #All stems
        icpt_all = nrow(intersect_trees),
        
        #All live standing stems
        icpt_ls = intersect_trees %>%
          filter(status %in% c("LS", "LF", "LL")) %>% 
          nrow(),
        
        #All livestanding trees by hw/non-hw and canopy class
        icpt_hw_C = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "C") %>% 
          nrow(),
        icpt_hw_I = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "I") %>% 
          nrow(),
        icpt_hw_S = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "S") %>% 
          nrow(),
        icpt_nh_C = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "C") %>% 
          nrow(),
        icpt_nh_I = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "I") %>% 
          nrow(),
        icpt_nh_S = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "S") %>% 
          nrow())
    
    #Use tree counts to calculate interception factors
      ##First, join tree type and crown class class to table
      x <- pairs %>% select(t1_row, t2_row, t1_tt, t2_tt, t1_spp, t2_spp,
                                     t1_cc, t2_cc)
      icpt_rm <- left_join(icpt_rm, x, by = c("t1_row", "t2_row"))
      
      ##Calculate total interception. See rules and factors at of section for 
      ##reference
      ##For [regen-mature pairs]:
      icpt_rm <- icpt_rm %>% 
        mutate(
          icpt_tot = case_when(
            t2_cc == "C" ~ 
              (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f1 +
              (icpt_hw_S + icpt_nh_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_hw_I + icpt_nh_I)*icpt_f1 + (icpt_hw_S + icpt_nh_S)*icpt_f1,
            .default = 0),
          icpt_hw = case_when(
            t2_cc == "C" ~ 
              (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f1,
            .default = 0),
          icpt_nh = case_when(
            t2_cc == "C" ~ 
              (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f1,
            .default = 0)
          )
  }
  
  #For [regen-regen] pairs
  if(nrow(pairs_rr) > 0) {
    #Draw lines between each unique pair
    ##Subset dataframes of pairs to the index columns that relate each tree 
    ##in a pair back to the site level dataframe
    #Also include tree ids
    pair_lines_rr <- pairs_rr %>% select(tree1, tree2, t1_row, t2_row)
    
    ##Now use these dataframes to call points from original pairs object (hw) 
    ##to connect with a line
    pair_lines_rr <- pair_lines_rr %>% 
      rowwise() %>% 
      mutate(geometry = 
               st_union(hw$geometry[t1_row,], hw$geometry[t2_row,]) %>% 
               st_cast("LINESTRING")) %>% 
      ungroup() %>% 
      st_as_sf()
    
    ##Add a check to ensure geometries are valid
    if(any(!st_is_valid(pair_lines_rr))) {
      stop("Some regen-regen pair lines have invalid geometry")
    }
    
    #Buffer lines by buffer distance, specified in function
    #fp = footprint
    pair_fp_rr <- pair_lines_rr %>% 
      st_buffer(dist = buffer, endCapStyle = "FLAT")
    
    #Now calculate number of trees on each path
    icpt_rr <- pair_fp_rr %>% 
      rowwise() %>% 
      mutate(
        #Intersect each polygon (buffered line) with the points in the site
        #level dataframe. Stored as a list
        intersect_indices = list(which(sf$tree_id != tree2 &
                                     sf$tree_id != tree1 &
                                     sf$tree_type == "regen" & 
                                     st_intersects(sf, geometry, 
                                                  sparse = FALSE))),
        
        #Use the indices to subset the site level dataframe to the trees that
        #intersect. Also stored as a list. 
        #slice() subsets dataframe based on row indicies
        intersect_trees = list(sf %>% slice(intersect_indices)),
        
        #Get counts of number of trees in different categories 
        #All stems
        icpt_all = nrow(intersect_trees),
        
        #All live standing stems
        icpt_ls = intersect_trees %>%
          filter(status %in% c("LS", "LF", "LL")) %>% 
          nrow(),
        
        #All livestanding trees by hw/non-hw and canopy class
        icpt_hw_C = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "C") %>% 
          nrow(),
        icpt_hw_I = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "I") %>% 
          nrow(),
        icpt_hw_S = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp == "Hw" & 
                   crown_class_2 == "S") %>% 
          nrow(),
        icpt_nh_C = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "C") %>% 
          nrow(),
        icpt_nh_I = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "I") %>% 
          nrow(),
        icpt_nh_S = intersect_trees %>% 
          filter(status %in% c("LS", "LF", "LL") &
                   spp != "Hw" & 
                   crown_class_2 == "S") %>% 
          nrow())
    
    #Use tree counts to calculate interception factors
      ##First, join tree type and crown class class to table
      x <- pairs %>% select(t1_row, t2_row, t1_tt, t2_tt, t1_spp, t2_spp,
                                     t1_cc, t2_cc)
      icpt_rr <- left_join(icpt_rr, x, by = c("t1_row", "t2_row"))
      
      ##Calculate total interception. See rules and factors at of section for 
      ##reference
      icpt_rr <- icpt_rr %>% 
        mutate(
          icpt_tot = case_when(
            t2_cc == "C" ~ 
              (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_hw_C + icpt_nh_C)*icpt_f1 + (icpt_hw_I + icpt_nh_I)*icpt_f1 +
              (icpt_hw_S + icpt_nh_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_hw_C + icpt_nh_C)*icpt_f1+ (icpt_hw_I + icpt_nh_I)*icpt_f1 + 
              (icpt_hw_S + icpt_nh_S)*icpt_f1,
            .default = 0),
          icpt_hw = case_when(
            t2_cc == "C" ~ 
              (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_hw_C)*icpt_f1 + (icpt_hw_I)*icpt_f1 + (icpt_hw_S)*icpt_f1,
            .default = 0),
          icpt_nh = case_when(
            t2_cc == "C" ~ 
              (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f2,
            t2_cc == "I" ~ 
              (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f2,
            t2_cc == "S" ~ 
              (icpt_nh_C)*icpt_f1 + (icpt_nh_I)*icpt_f1 + (icpt_nh_S)*icpt_f1,
            .default = 0)
          )
  }
  #Join objects separated into regen-mature and regen-regen pairs back 
  #together
  ##Interception polygons:
  if (nrow(pairs_rm) > 0 & nrow(pairs_rr) > 0) {
    icpt_all <- rbind(icpt_rm, icpt_rr)
  } else {
    icpt_all <- icpt_rm
  }
  ##Lines between pairs:
  if (nrow(pairs_rm) > 0 & nrow(pairs_rr) > 0) {
    pair_lines <- rbind(pair_lines_rm, pair_lines_rr)
  } else {
    pair_lines <- pair_lines_rm
  } 
  
  #icpt_xx objects are polygons of paths between trees that have 
  #interception attributes attached. Don't need to keep the polygons for 
  #the final output so drop those
  icpt_all <- icpt_all %>% st_drop_geometry()
  
  #Join interception estimates back to dataframe of pairs
  x <- icpt_all %>% select(t1_row, t2_row, icpt_tot, icpt_hw, 
                           icpt_nh)
  pairs <- left_join(pairs, x, by = c("t1_row", "t2_row"))
  
  #Set interception equal to 0 for pairs in the same place
  pairs <- pairs %>% 
    mutate(across(c("icpt_tot", "icpt_hw", "icpt_nh"), 
                  ~if_else(dist_m == 0, 0, .)))

  #Join interception estimates to paired lines
  x <- pairs %>% select(t1_row, t2_row, pair_type, icpt_tot, icpt_hw, 
                           icpt_nh)
  pair_lines <- left_join(pair_lines, x, by = c("t1_row", "t2_row"))
  
  #Add site_id to both objects
  pairs <- pairs %>% mutate(site_id = site_id)
  pair_lines <- pair_lines %>% mutate(site_id = site_id)
  
  #FINAL OUTPUTS
  ##[1] "pairs"; dataframe of pairs of HDM hosts for a given site 
  ##with estimates of interception between them. 
  ##[2] "pair_lines"; sf object of lines between each pair with estimates
  ##of interception. Good for graphing. 
  return(list(pairs = pairs, pair_lines = pair_lines))    
}

Try applying the function over another test set of three sites

#Redefine trees_sf object in case the cut block above wasn't run
#Coordinates are called X and Y
#CRS BC Albers NAD83 = 3005
trees_sf <- st_as_sf(trees, coords = c("X", "Y"), crs = 3005)
#check CRS
st_crs(trees_sf)
## Coordinate Reference System:
##   User input: EPSG:3005 
##   wkt:
## PROJCRS["NAD83 / BC Albers",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["British Columbia Albers",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",45,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-126,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",50,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",58.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",1000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Province-wide spatial data management."],
##         AREA["Canada - British Columbia."],
##         BBOX[48.25,-139.04,60.01,-114.08]],
##     ID["EPSG",3005]]
#Filter the mature trees from the dataset that were beyond the 10m boundary used across all sites. 
trees_sf <- trees_sf %>% filter(outside_10 == "N" | is.na(outside_10))

#Subset to three sites, one from each cluster, then separate out each site
#Resulting object is a list of dataframes
test <- trees_sf %>% 
  filter(site_id %in% c("mi_1", "cr_1", "ph_1")) %>% 
  group_by(site_id) %>% 
  group_split()

#Set parameters for the function. Do this by redefining generic function
#above, but with buffer and interception factors set. 
#First parameter set: 
## buffer = 2.5m
## icpt_f1 = 0.9 (trees where this factor apply reduce seed load by 90%)
## icpt_f1 = 0.45 (trees where this factor apply reduce seed load by 45%)
f_icpt_set1 <- function(sf){
  f_icpt(sf = sf, buffer = 2.5, icpt_f1 = 0.9, icpt_f2 = 0.45)
}

#Apply the function to each site level dataframe automatically with map 
#function
#Output: list of two item lists. The higher level list is for each site. Within 
#each site, there is a list of the two outputs (see above) from the function. 
icpt <- map(test, f_icpt_set1)

#Extract the dataframes with interception estimates
pairs <- lapply(icpt, function(x) x[[1]])
pairs <- do.call(rbind, pairs)
##Reorder the columns
pairs <- pairs %>% select(site_id, pair_type, everything())

#Extract the paired lines
pair_lines <- lapply(icpt, function(x) x[[2]])
pair_lines <- do.call(rbind, pair_lines)
##Reorder the columns
pair_lines <- pair_lines %>% select(site_id, pair_type, everything())

#Plot these objects to make sure they worked
##First plot, points and connecting lines with no filter
##These are all the pairs created in the interception workflow --> all pairs 
##of live Hw trees where source tree has seed production > 0
##Lack of regen-regen pairs at ph_1 isn't a mistake. There was almost no 
##infection at that site
test_g <- trees_sf %>% 
  filter(site_id %in% c("mi_1", "cr_1", "ph_1"))
tmap_mode("plot") + 
   tm_shape(pair_lines) + 
    tm_lines(col="pair_type") +
  tm_shape(test_g) + 
    tm_symbols(col = "sp", scale = 0.5) + 
  tm_grid() +
    tm_facets(by = "site_id", ncol = 3, free.coords = T, free.scales = T)
## tmap mode set to plotting

##Second plot, same as above but filtered to remove lines where interception 
##removes 100% of seed load (icpt_tot>1)
tmap_mode("plot") + 
  tm_shape(pair_lines %>% filter(icpt_tot < 1)) + 
    tm_lines(col="pair_type") +
  tm_shape(test_g) + 
    tm_symbols(col = "dmr_f", scale = 0.5) + 
  tm_grid() +
    tm_facets(by = "site_id", ncol = 3, free.coords = T, free.scales = T)
## tmap mode set to plotting

##Third plot: violin plot of interception
icpt_comp <- pairs %>% pivot_longer(cols = starts_with("icpt"),
                                    names_to = "icpt_ver",
                                    names_prefix = "icpt_",
                                    values_to="icpt")
ggplot(icpt_comp, aes(x = site_id, y = icpt)) +
  geom_point() +
  geom_violin(fill=NA) + 
  facet_grid(pair_type~icpt_ver, scales = "free")

Scale up and apply function to entire HDM dataset

#Redefine trees_sf object in case the cut block above wasn't run
#Coordinates are called X and Y
#CRS BC Albers NAD83 = 3005
trees_sf <- st_as_sf(trees, coords = c("X", "Y"), crs = 3005)
#check CRS
st_crs(trees_sf)
## Coordinate Reference System:
##   User input: EPSG:3005 
##   wkt:
## PROJCRS["NAD83 / BC Albers",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["British Columbia Albers",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",45,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-126,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",50,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",58.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",1000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Province-wide spatial data management."],
##         AREA["Canada - British Columbia."],
##         BBOX[48.25,-139.04,60.01,-114.08]],
##     ID["EPSG",3005]]
#Filter the mature trees from the dataset that were beyond the 10m boundary used across all sites. 
trees_sf <- trees_sf %>% filter(outside_10 == "N" | is.na(outside_10))

#Separate out each site. Resulting object is a list of dataframes
site_sf <- trees_sf %>% 
  group_by(site_id) %>% 
  group_split()

#Set parameters for the function. Do this by redefining generic function
#above, but with buffer and interception factors set. 
#First parameter set: 
## buffer = 2.5m
## icpt_f1 = 0.9 (trees where this factor apply reduce seed load by 90%)
## icpt_f1 = 0.45 (trees where this factor apply reduce seed load by 45%)
f_icpt_set1 <- function(sf){
  f_icpt(sf = sf, buffer = 2.5, icpt_f1 = 0.9, icpt_f2 = 0.45)
}

#Apply the function to each site level dataframe automatically with map 
#function
#Output: list of two item lists. The higher level list is for each site. Within 
#each site, there is a list of the two outputs (see above) from the function. 
icpt <- map(site_sf, f_icpt_set1)

#Extract the dataframes with interception estimates
pairs <- lapply(icpt, function(x) x[[1]])
pairs <- do.call(rbind, pairs)
##Reorder the columns
pairs <- pairs %>% select(site_id, pair_type, everything())

#Extract the paired lines
pair_lines <- lapply(icpt, function(x) x[[2]])
pair_lines <- do.call(rbind, pair_lines)
##Reorder the columns
pair_lines <- pair_lines %>% select(site_id, pair_type, everything())

#Plot these objects to make sure they worked
##First plot, points and connecting lines with no filter
##These are all the pairs created in the interception workflow --> all pairs 
##of live Hw trees where source tree has seed production > 0
##Lack of regen-regen pairs at ph_1 isn't a mistake. There was almost no 
##infection at that site
tmap_mode("plot") + 
   tm_shape(pair_lines) + 
    tm_lines(col="pair_type") +
  tm_shape(trees_sf) + 
    tm_symbols(col = "sp", scale = 0.5) + 
  tm_grid() +
    tm_facets(by = "site_id", ncol = 4, free.coords = T, free.scales = T)
## tmap mode set to plotting

#Put it all together At this point we have: - seed production estimates for each Hw tree - a seed dispersal function that estimates the proportion of a seed production value that reaches a given distance - a dataframe of pairs of live Hw trees that are within 25m of eachother and where the source tree has some amount of HDM - estimates of interception on the path between each pair

Next step is to combine these pieces to estimate the seed load reaching the target tree in each pair. Then sum the seed load values for each pair a target tree is involved in to get the total cumulative seed load for each regen tree.

Calculate seed load from regen source trees and seed load from mature source trees separately and then together so both can be used in analysis.

#Add crown width of the source tree to the pairs dataset
x <- trees %>% select(tree_id, LCW)
pairs <- left_join(pairs, x, by = join_by(tree2 == tree_id))
summary(pairs$LCW)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.974   3.143   4.203   4.644   6.101  10.979
#Define new distance variable: distance zeroed on the edge of the crown
#Crown width havled to get crown radius
pairs <- pairs %>% 
  mutate(dist_m_ed = dist_m - (LCW/2))

#Check there are no NAs in sp
any(is.na(pairs$sp))
## [1] FALSE
#Then use dispersal function to estimate proportion of sp hitting a given tree 
pairs <- pairs %>% rowwise %>% 
  mutate(p_seed = f_p_seed(dist_m_ed))

#Check there are no NAs
any(is.na(pairs$p_seed))
## [1] FALSE
#Plot proportion of seed vs. distance between pairs
##Coloured by crown class of source tree because they have bigger crowns and 
##should be able to reach trees farther away
##looks good
ggplot(pairs, aes(x = dist_m, y = p_seed, 
                  colour = t2_cc)) + 
  geom_point() +
  facet_wrap(~pair_type)

#Also just make simple violin plot
##Most regen-mature pairs far apart, low p_seed, relative to regen-regen pairs
ggplot(pairs, aes(x = t2_cc, y = p_seed)) + 
  geom_point() +
  geom_violin(fill = NA) + 
  facet_wrap(~pair_type)

#Now calculate seed load assuming no interception (just accounting for 
#dispersal)
pairs <- pairs %>% 
  mutate(sl_ni = sp*p_seed)

#Plot seed load vs. distance between trees
##Behaving as it should, co-dominant trees, have bigger sp value and reach 
##farther. Suppressed trees (on other end of the spectrum) have smaller sp 
##values and smaller crowns, so reach trees closer
ggplot(pairs, aes(x = dist_m, y = sl_ni, colour = t2_cc)) + 
  geom_point() + 
  facet_wrap(~pair_type, scales = "free_y")

#Now calculate sl with interception
pairs <- pairs %>% 
  mutate(sl_i_tot = sl_ni - (sl_ni*icpt_tot),
         sl_i_hw = sl_ni - (sl_ni*icpt_hw),
         sl_i_nh = sl_ni - (sl_ni*icpt_nh)) %>% 
  mutate(sl_i_tot = if_else(sl_i_tot < 0, 0, sl_i_tot),
         sl_i_hw = if_else(sl_i_hw < 0, 0, sl_i_hw),
         sl_i_nh = if_else(sl_i_nh < 0, 0, sl_i_nh))

#Check there are no NAs in the sl columns
any(is.na(pairs$sl_ni))
## [1] FALSE
any(is.na(pairs$sl_i_tot))
## [1] FALSE
any(is.na(pairs$sl_i_hw))
## [1] FALSE
any(is.na(pairs$sl_i_nh))
## [1] FALSE
#Gather seed load with and without interception into a single column to compare
sl_comp <- pairs %>% 
  pivot_longer(cols = starts_with("sl_"),
               names_to = "sl_ver",
               names_prefix = "sl_",
               values_to = "sl")

#Make a violin plot
##Looks good, interception lowers seed load relative to no interception 
##scenario (ni)
ggplot(sl_comp, aes(x = sl_ver, y = sl)) +
  geom_violin(fill=NA)+
  facet_wrap(~pair_type, scales = "free")

#Summarise data to get the cumulative seed load for each target tree
##For regen-mature pairs
sl_target_rm <- pairs %>% 
  filter(pair_type == "r-m") %>% 
  group_by(tree1) %>% 
  summarise(sl_ni_rm = sum(sl_ni),
            sl_i_tot_rm = sum(sl_i_tot),
            sl_i_hw_rm = sum(sl_i_hw),
            sl_i_nh_rm = sum(sl_i_nh))

##For regen-regen pairs
sl_target_rr <- pairs %>% 
  filter(pair_type == "r-r") %>% 
  group_by(tree1) %>% 
  summarise(sl_ni_rr = sum(sl_ni),
            sl_i_tot_rr = sum(sl_i_tot),
            sl_i_hw_rr = sum(sl_i_hw),
            sl_i_nh_rr = sum(sl_i_nh))

#Add these values back to the tree level dataframe:
trees <- left_join(trees, sl_target_rm, by = join_by(tree_id == tree1))
trees <- left_join(trees, sl_target_rr, by = join_by(tree_id == tree1))

#Get a summary of these new variables
##Lots of NAs here, some are valid, some aren't. 
##seed load should have a value for all live Hw regen trees. If they 
##are more than 25m from the edge and/or don't have any regen trees in their
##vicinity with HDM, they may be NA when they should be 0. Reassign these 
##cases
trees %>% select(starts_with("sl")) %>% summary()
##     sl_ni_rm          sl_i_tot_rm        sl_i_hw_rm        sl_i_nh_rm      
##  Min.   :   0.0201   Min.   :  0.000   Min.   :  0.000   Min.   :   0.018  
##  1st Qu.:   3.6219   1st Qu.:  1.411   1st Qu.:  1.483   1st Qu.:   3.168  
##  Median :  27.6053   Median : 12.614   Median : 13.862   Median :  25.703  
##  Mean   : 102.6591   Mean   : 46.119   Mean   : 49.594   Mean   :  96.001  
##  3rd Qu.: 133.7664   3rd Qu.: 60.471   3rd Qu.: 64.454   3rd Qu.: 121.064  
##  Max.   :1452.5697   Max.   :542.940   Max.   :546.712   Max.   :1451.011  
##  NA's   :2091        NA's   :2091      NA's   :2091      NA's   :2091      
##     sl_ni_rr        sl_i_tot_rr       sl_i_hw_rr       sl_i_nh_rr      
##  Min.   :  0.003   Min.   : 0.000   Min.   : 0.000   Min.   :  0.0000  
##  1st Qu.:  0.950   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:  0.0517  
##  Median :  8.542   Median : 0.000   Median : 0.474   Median :  4.3618  
##  Mean   : 19.315   Mean   : 4.951   Mean   : 6.212   Mean   : 13.7389  
##  3rd Qu.: 24.779   3rd Qu.: 5.271   3rd Qu.: 6.501   3rd Qu.: 17.2444  
##  Max.   :150.236   Max.   :85.747   Max.   :88.626   Max.   :142.9166  
##  NA's   :2063      NA's   :2063     NA's   :2063     NA's   :2063
#Create a temporary object with all the trees that should have values for sl
x <- trees %>% 
  filter(tree_type == "regen" &
           spp =="Hw" &
           status %in% c("LS", "LL", "LF"))
x %>% select(starts_with("sl")) %>% summary()
##     sl_ni_rm          sl_i_tot_rm        sl_i_hw_rm        sl_i_nh_rm      
##  Min.   :   0.0201   Min.   :  0.000   Min.   :  0.000   Min.   :   0.018  
##  1st Qu.:   3.6219   1st Qu.:  1.411   1st Qu.:  1.483   1st Qu.:   3.168  
##  Median :  27.6053   Median : 12.614   Median : 13.862   Median :  25.703  
##  Mean   : 102.6591   Mean   : 46.119   Mean   : 49.594   Mean   :  96.001  
##  3rd Qu.: 133.7664   3rd Qu.: 60.471   3rd Qu.: 64.454   3rd Qu.: 121.064  
##  Max.   :1452.5697   Max.   :542.940   Max.   :546.712   Max.   :1451.011  
##  NA's   :506         NA's   :506       NA's   :506       NA's   :506       
##     sl_ni_rr        sl_i_tot_rr       sl_i_hw_rr       sl_i_nh_rr      
##  Min.   :  0.003   Min.   : 0.000   Min.   : 0.000   Min.   :  0.0000  
##  1st Qu.:  0.950   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:  0.0517  
##  Median :  8.542   Median : 0.000   Median : 0.474   Median :  4.3618  
##  Mean   : 19.315   Mean   : 4.951   Mean   : 6.212   Mean   : 13.7389  
##  3rd Qu.: 24.779   3rd Qu.: 5.271   3rd Qu.: 6.501   3rd Qu.: 17.2444  
##  Max.   :150.236   Max.   :85.747   Max.   :88.626   Max.   :142.9166  
##  NA's   :478       NA's   :478      NA's   :478      NA's   :478
#Find out how many need to be reassigned
x %>% filter(is.na(sl_ni_rm) | is.na(sl_ni_rr)) %>% nrow()
## [1] 687
#Then do the reassignment: 
trees <- trees %>% 
  mutate(across(starts_with("sl") & ends_with("rm"), 
                  ~ if_else(tree_type == "regen" &
                              spp == "Hw" &
                              status %in% c("LS", "LL", "LF") &
                              is.na(sl_ni_rm), 0, .)),
         across(starts_with("sl") & ends_with("rr"), 
                  ~ if_else(tree_type == "regen" &
                              spp == "Hw" &
                              status %in% c("LS", "LL", "LF") &
                              is.na(sl_ni_rr), 0, .))
         )

#Check we got them all
x <- trees %>% 
  filter(tree_type == "regen" &
           spp =="Hw" &
           status %in% c("LS", "LL", "LF"))
x %>% filter(is.na(sl_ni_rm) | is.na(sl_ni_rr)) %>% nrow()
## [1] 0
#Calculate the total seed load (sum of regen and mature source trees)
trees <- trees %>% 
  mutate(sl_ni_all = sl_ni_rm + sl_ni_rr,
         sl_i_tot_all = sl_i_tot_rm + sl_i_tot_rr,
         sl_i_hw_all = sl_i_hw_rm + sl_i_hw_rr,
         sl_i_nh_all = sl_i_nh_rm + sl_i_nh_rr)

#Done. Export this version of the trees object
# write_csv(trees, here("./data/workflow/trees_sl.csv"))

#Exploratory plots relating seed load to regen tree infection

#Plot seed load from mature trees as function of distance from the edge:
x <- pivot_longer(trees, 
                  cols = starts_with("sl_i_tot"),
                  names_to = "sl_source",
                  names_prefix = "sl_i_tot",
                  values_to = "sl")
ggplot(x, aes(x = dist_y_h, y = sl, colour = sl_source)) + 
  geom_point()
## Warning: Removed 4755 rows containing missing values or values outside the scale range
## (`geom_point()`).

#Plot seed load of mature trees vs dmr
#Looks sort of promising, there is a slight right trend
x <- trees %>% 
  filter(tree_type == "regen" &
           spp == "Hw" &
           status %in% c("LS", "LL", "LF"))

ggplot(x, aes(x = sl_i_tot_rm, y = factor(dmr, levels = c(0,1,2,3,4,5,6),
                                          ordered = TRUE))) + 
  geom_boxplot() 

#Extra code Version of seed dispersal function fitted with the gamma function

#Goal is to create a function that models the portion of seed originating from
#a tree crown that is deposited a given distance away
#The seed dispersal curve is modeled with the gamma function. It relates the 
#proportion of seed dispersed to the distance from the tree
#It's parameters have been adjusted to match data from Smith (1966), who 
#measured seed deposition as a function of distance from a tree stem

#Chat GPTs description of how paramaeters affect the shape: 
#Alpha = shape parameter
##When alpha > 1, the distribution has a positive skew, starting from zero, 
##increasing to a peak, and then decreasing.
##When alpha = 1, the gamma distribution reduces to the exponential 
#distribution.
##When alpha < 1, the gamma distribution has a heavier tail, meaning higher 
##probabilities for small values of d.
##Generally, increasing alpha shifts the mode (peak) of the distribution to 
#the right and makes the distribution more symmetric.

#Beta = scale parameter
##The beta parameter stretches or compresses the distribution along the 
##horizontal axis. Larger values of beta spread the distribution over a 
##wider range, while smaller values compress it.
##Increasing beta shifts the peak to the right (values of d increase).

#Define the function
f_gamma <- function(d, alpha, beta) {
  (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))}

#The gamma values are probability density functions (= proportional to the 
#density of values around that point). For us the actual value is meaningless, 
#we will use proportions of the total probability over a given interval (=
#the integral) because the biological value we are trying to model is
#proportion of seed reaching a distance x
#Define a function that integrates the gamma function over the defined interval
#and normalize by this value
f_int_gamma <- function(alpha, beta) {
  integrate(f_gamma, lower = 0.1, upper = 20, 
            alpha = alpha, beta = beta)$value
}

#First step is to fit the gamma function to the seed dispersal data from 
#Smith (1966)
#Create an dataframe with distance values from 0-20 at 0.1m intervals
df <- tibble(dist_m = seq(0, 20, by=0.1))

#Get estimates of gamma distributions with different values for alpha and beta
#These are equivalent to the number of seeds per_m2 at each distance in the 
#Smith data
#Normalize the gamma estimates by the integral. This makes the estimates 
#equivalent to the proportion of seed from a given tree at a distance x
#I played with the alpha values to get them to match the shape of the Smith
#data and then added a scale factor (2.5) to get the proportion to be about
#equal
df <- df %>% 
  mutate(p_seed = f_gamma(d = dist_m, alpha = 2.5, beta = 1.25)*2.5/
           f_int_gamma(alpha = 2.5, beta = 1.25),
         source = "gamma")

#Plot this
ggplot(df, aes(x = dist_m, y = p_seed, color = source)) + 
  geom_point()

#Compare this to the values from Smith
#Calculate proportion seed at a distance (x) for Smith data
seed_total <- sum(smith$seed_m2)
smith <- smith %>% mutate(p_seed = seed_m2/seed_total)

#Subset to dist and p_seed and then bind to gamma dataframe
x <- smith %>% select(dist_m, p_seed) %>% mutate(source = "smith")
df <- rbind(df, x)

##Plot
ggplot(df, aes(x = dist_m, y = p_seed, colour = source)) + geom_point()

#Define function for seed dispersal based on this fitting
f_disp <- function(d) {
  gamma <- f_gamma(d, alpha = 2.5, beta = 1.25)
  scaled_gamma <- gamma*2.5
  p_seed <- scaled_gamma/f_int_gamma(alpha = 2.5, beta = 1.25)
  return(p_seed)
  }
#Example, proportion of seed produced deposited 5m away
f_disp(5) #compare with last plot, makes sense
## [1] 0.2205626
#Kept this code because its interesting, tells you what x value the max value 
#of a function occurs
#maximum = distance where max occurs, objective = corresponding gamma value
# optimize(gamma_function(d, alpha = 1, beta = 1), interval = c(0, 10), maximum = TRUE)

Playing with the gamma function

# Define the gamma function
gamma_function <- function(d, alpha = 4, beta = 1.7) {
  (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))}

#Define a vector of distances to sample from
x1 <- seq(0.1, 20, by=0.1)
#Randomly sample 200 distance values from that vector
d <- sample(x1, size = 200, replace = TRUE, prob = NULL)

#Relate distance to a variable with the gamma distribution
alpha <- 2
beta <- 1
gamma <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
plot(d, gamma) 

alpha <- 3
beta <- 1
gamma2 <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
points(d, gamma2, col="firebrick")

alpha <- 1
beta <- 1
gamma3 <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
points(d, gamma3, col="orange")

alpha <- 2
beta <- 1.5
gamma4 <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
points(d, gamma4, col="blue")

alpha <- 2
beta <- 0.5
gamma5 <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
points(d, gamma5, col="lightblue")

alpha <- 4
beta <- 1.7
gamma6 <- (d^(alpha - 1) * exp(-d / beta)) / (beta^alpha * gamma(alpha))
points(d, gamma6, col="green") #winner! a